Abstract

We show that Kolmogorov complexity and its estimators, such as universal
codes (or data compression methods), can be applied to hypothesis testing
in the framework of classical mathematical statistics. Methods for identity
testing and for nonparametric testing of serial independence of time series
are suggested.

1 Introduction.

The Kolmogorov complexity, or algorithmic entropy, was suggested in [?] and
has been investigated in numerous papers; for a review see [?]. This notion
now plays an important role in the theory of algorithms, information theory,
artificial intelligence and many other fields, and is closely connected with
such deep theoretical issues as the definition of randomness, the logical
basis of probability theory, and the relation between randomness and
complexity (see [?]). In this paper we show that Kolmogorov complexity can
be applied to hypothesis testing in the framework of mathematical
statistics. Moreover, we suggest using universal codes (or methods of data
compression), which are estimators of Kolmogorov complexity, for testing.

*Research was supported by the joint project grant "Efficient randomness testing of random
and pseudorandom number generators" of the Royal Society, UK (grant ref: 15995) and the Russian
Foundation for Basic Research (grant no. 03-01-00495).



In this paper we consider a stationary and ergodic source (or process) which
generates elements from a finite set (or alphabet) $A$, and two problems of
statistical testing. The first problem is identity testing, which is
described as follows: the hypothesis $H_0^{id}$ is that the source has a
particular distribution $\pi$, and the alternative hypothesis $H_1^{id}$ is
that the sequence is generated by a stationary and ergodic source which
differs from the source under $H_0^{id}$. One particular case, in which the
source alphabet is $A = \{0, 1\}$ and the main hypothesis $H_0^{id}$ is that
a bit sequence is generated by the Bernoulli source with equal probabilities
of 0's and 1's, applies to the randomness testing of random number and
pseudorandom number generators.

The second problem is a generalization of the problem of nonparametric
testing for independence of time series. More precisely, we consider the
following two hypotheses: $H_0^{ind}$ is that the source is Markovian with
memory (or connectivity) not larger than $m$ ($m \ge 0$), and the
alternative hypothesis $H_1^{ind}$ is that the sequence is generated by a
stationary and ergodic source which differs from the source under
$H_0^{ind}$. In particular, if $m = 0$, this is the problem of testing for
independence of time series. This problem is well known in mathematical
statistics and there is an extensive literature dealing with nonparametric
independence testing.

In both cases the testing should be based on a sample $x_1 \ldots x_t$
generated by the source.

We suggest statistical tests for identity testing and for nonparametric
testing of serial independence of time series, which are based on Kolmogorov
complexity and on its estimates given by universal codes. It is important
to note that the archivers used in practice can be used for the suggested
testing, because they can be considered as methods for estimating
Kolmogorov complexity.

The outline of the paper is as follows. The next part contains definitions
and necessary information. Parts three and four are devoted to identity
testing and to testing of serial independence, respectively. The fifth part
contains results of experiments in which the suggested method of identity
testing is applied to pseudorandom number generators. All proofs are given
in the Appendix.

2 Definitions and Preliminaries. 

First we define stochastic processes (or sources of information). Consider an
alphabet $A = \{a_1, \ldots, a_n\}$ with $n \ge 2$ letters and denote by $A^t$
and $A^*$ the set of all words of length $t$ over $A$ and the set of all finite
words over $A$, respectively ($A^* = \bigcup_{i=1}^{\infty} A^i$). Let $\mu$ be
a source which generates letters from $A$. Formally, $\mu$ is a probability
distribution on the set of words of infinite length or, more simply,
$\mu = (\mu^t)_{t \ge 1}$ is a consistent set of probabilities over the sets
$A^t$, $t \ge 1$. By $M_\infty(A)$ we denote the set of all stationary and
ergodic sources which generate letters from $A$. Let
$M_k(A) \subset M_\infty(A)$ be the set of Markov sources with memory (or
connectivity) $k$, $k \ge 0$. More precisely, by definition
$\mu \in M_k(A)$ if

$$\mu(x_{t+1} = a_{i_1} / x_t = a_{i_2}, x_{t-1} = a_{i_3}, \ldots, x_{t-k+1} = a_{i_{k+1}}, \ldots)$$
$$= \mu(x_{t+1} = a_{i_1} / x_t = a_{i_2}, x_{t-1} = a_{i_3}, \ldots, x_{t-k+1} = a_{i_{k+1}}) \qquad (1)$$

for all $t \ge k$ and $a_{i_1}, a_{i_2}, \ldots \in A$. By definition, $M_0(A)$
is the set of all Bernoulli (or i.i.d.) sources over $A$, and
$M^*(A) = \bigcup_{i=0}^{\infty} M_i(A)$ is the set of all finite-memory sources.
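As a rough illustration of definition (1), the sketch below samples from a binary source with memory 1 and estimates one conditional probability from the sample. The function name `sample_markov`, the transition table `P` and the seed are assumptions made for this example only; the start context is chosen arbitrarily rather than from the stationary distribution.

```python
import random
from collections import Counter

def sample_markov(transition, k, t, seed=0):
    """Sample t letters from a memory-k source.

    `transition` maps each context (a tuple of the k most recent letters,
    oldest first) to a dict {letter: probability}.
    """
    rng = random.Random(seed)
    context = rng.choice(list(transition))  # arbitrary start context
    out = list(context)
    while len(out) < t:
        letters, probs = zip(*transition[context].items())
        out.append(rng.choices(letters, weights=probs)[0])
        context = tuple(out[-k:]) if k > 0 else ()
    return out[:t]

# Hypothetical binary source with memory 1: the next bit repeats the
# previous one with probability 0.9.
P = {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.1, 1: 0.9}}
x = sample_markov(P, k=1, t=100_000)

# Empirical estimate of mu(x_{t+1} = 0 / x_t = 0) from adjacent pairs.
pairs = Counter(zip(x, x[1:]))
p00 = pairs[(0, 0)] / (pairs[(0, 0)] + pairs[(0, 1)])
print(round(p00, 3))
```

For a source in $M_1(A)$ the estimate depends only on the previous letter, so it should stay close to the table entry 0.9 however much earlier history is conditioned on.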

Now we define codes and the Kolmogorov complexity. Let $A^\infty$ be the set of
all infinite words $x_1 x_2 \ldots$ over the alphabet $A$. A data compression
method (or code) $\varphi$ is defined as a set of mappings $\varphi_n$ such that
$\varphi_n : A^n \to \{0, 1\}^*$, $n = 1, 2, \ldots$, and for each pair of
different words $x, y \in A^n$, $\varphi_n(x) \ne \varphi_n(y)$. Informally,
this means that the code $\varphi$ can be applied to compress each message of
any length $n$ over the alphabet $A$, and the message can be decoded if its
code is known. It is also required that each sequence
$\varphi_n(u_1)\varphi_n(u_2) \ldots \varphi_n(u_r)$, $r \ge 1$, of encoded
words from the set $A^n$, $n \ge 1$, can be uniquely decoded into
$u_1 u_2 \ldots u_r$. Such codes are called uniquely decodable. For example,
if $A = \{a, b\}$, the code $\psi_1(a) = 0$, $\psi_1(b) = 00$ is obviously not
uniquely decodable. It is well known that if a code $\varphi$ is uniquely
decodable then the lengths of the codewords satisfy the following inequality
(Kraft inequality): $\sum_{u \in A^n} 2^{-|\varphi_n(u)|} \le 1$; see, e.g.,
[?]. (Here and below $|v|$ is the length of $v$ if $v$ is a word, and the
number of elements of $v$ if $v$ is a set.) It will be convenient to
reformulate this property as follows:

Claim 1. Let $\varphi$ be a uniquely decodable code over an alphabet $A$. Then
for any integer $n$ there exists a measure $\mu_\varphi$ on $A^n$ such that

$$|\varphi(u)| \ge -\log \mu_\varphi(u) \qquad (2)$$

for any $u$ from $A^n$. (Here and below $\log \equiv \log_2$.)
(Obviously, the claim is true for the measure
$\mu_\varphi(u) = 2^{-|\varphi(u)|} / \sum_{u \in A^n} 2^{-|\varphi(u)|}$.)
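The Kraft inequality and the measure of Claim 1 can be checked numerically on a toy code. In the sketch below the prefix code `phi` on $A^2$ is a made-up example, while `psi1` is the non-decodable code from the text; the helper names are assumptions of this illustration.

```python
import math

def kraft_sum(code):
    """Sum of 2^(-|codeword|) over all codewords of `code` (a dict)."""
    return sum(2.0 ** -len(w) for w in code.values())

def claim1_measure(code):
    """The measure mu(u) = 2^(-|code(u)|) / kraft_sum(code) from Claim 1."""
    s = kraft_sum(code)
    return {u: 2.0 ** -len(w) / s for u, w in code.items()}

# Hypothetical prefix (hence uniquely decodable) code on A^2, A = {a, b}.
phi = {"aa": "0", "ab": "10", "ba": "110", "bb": "111"}
mu = claim1_measure(phi)
# Inequality (2), with a tiny tolerance for floating-point rounding.
holds = all(len(phi[u]) >= -math.log2(mu[u]) - 1e-12 for u in phi)

# The code psi_1(a) = 0, psi_1(b) = 00 from the text: the letter sequences
# "aa" and "b" receive the same encoding, so decoding is ambiguous.
psi1 = {"a": "0", "b": "00"}

def encode(msg):
    return "".join(psi1[c] for c in msg)

collision = encode("aa") == encode("b")
print(kraft_sum(phi), holds, kraft_sum(psi1), collision)
```

Note that `psi1` still has a Kraft sum below 1: the inequality is necessary for unique decodability, not sufficient.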

In this paper we will use the so-called prefix Kolmogorov complexity, whose
precise definition can be found in [?]. Its main properties can be described
as follows. There exists a uniquely decodable code $\kappa$ such that i) there
is an algorithm of decoding (i.e. there is a Turing machine which maps
$\kappa(u)$ to $u$ for any $u \in A^*$) and ii) for any uniquely decodable code
$\psi$ whose decoding is algorithmically realizable, there exists a constant
$C_\psi$ such that

$$|\kappa(u)| - |\psi(u)| < C_\psi \qquad (3)$$

for any $u \in A^*$. The prefix Kolmogorov complexity $K(u)$ is defined as the
length of $\kappa(u)$: $K(u) = |\kappa(u)|$. The code $\kappa$ is not unique,
but the second property means that the codelengths of two codes $\kappa_1$ and
$\kappa_2$ for which i) and ii) are true are equal up to a constant:
$\big| |\kappa_1(u)| - |\kappa_2(u)| \big| < C_{1,2}$ for any word $u$ (and the
constant $C_{1,2}$ does not depend on $u$, see [?]). So, $K(u)$ is defined up
to a constant.

In what follows we call this value "Kolmogorov complexity" and uniquely
decodable codes just "codes".

We can see from ii) that the code $\kappa$ is asymptotically (up to the
constant) the best method of data compression, but it turns out that there is
no algorithm that can calculate the codeword $\kappa(u)$ (or even $K(u)$).
That is why the code $\kappa$ (and Kolmogorov complexity) cannot be used
directly for practical data compression. On the other hand, so-called
universal codes can be realized and, in a certain sense, can be used instead
of the optimal code $\kappa$ when they are applied to sequences generated by
any stationary and ergodic source. For their description we recall that (as is
known in information theory) sequences $x_1 \ldots x_t$ generated by a source
$p$ can be "compressed" to the length $-\log p(x_1 \ldots x_t)$ bits and, on
the other hand, there is no code $\psi$ for which the average codeword length
$\sum_{x_1 \ldots x_t \in A^t} p(x_1 \ldots x_t)\,|\psi(x_1 \ldots x_t)|$ is
less than $-\sum_{x_1 \ldots x_t \in A^t} p(x_1 \ldots x_t) \log p(x_1 \ldots x_t)$.
Universal codes can reach the lower bound $-\log p(x_1 \ldots x_t)$
asymptotically for any stationary and ergodic source $p$ with probability 1.
The formal definition is as follows: a code $\varphi$ is universal if for any
stationary and ergodic source $p$

$$\lim_{t \to \infty} t^{-1} \left( -\log p(x_1 \ldots x_t) - |\varphi(x_1 \ldots x_t)| \right) = 0 \qquad (4)$$

with probability 1. So, informally speaking, universal codes estimate the
probability characteristics of the source $p$ and use them for efficient
"compression". One of the first universal codes was described in [?]; see
also [?]. Now there are many efficient universal codes (and universal
predictors connected with them), which are described in numerous papers, see
[3, 5, 12, 13, 16].
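To make the quantity in (4) concrete, the sketch below evaluates the per-letter gap between the ideal length $-\log p(x_1 \ldots x_t)$ and the length produced by an off-the-shelf compressor for a Bernoulli source. Using Python's `zlib` as the code is an assumption of this example: it is a convenient stand-in, and it is not claimed to be universal in the sense of (4). The source parameter 0.1, the seed and the sample sizes are likewise arbitrary.

```python
import math
import random
import zlib

def ideal_bits(x, p1):
    """-log2 p(x) for a Bernoulli source with P(1) = p1."""
    ones = sum(x)
    return -(ones * math.log2(p1) + (len(x) - ones) * math.log2(1 - p1))

def code_bits(x):
    """|phi(x)| in bits, with zlib standing in for the code phi."""
    # Each letter is stored as one byte (0 or 1) before compression.
    return 8 * len(zlib.compress(bytes(x), 9))

rng = random.Random(1)
p1 = 0.1
gaps = []
for t in (1_000, 10_000, 100_000):
    x = [1 if rng.random() < p1 else 0 for _ in range(t)]
    gap = (ideal_bits(x, p1) - code_bits(x)) / t   # the quantity in (4)
    gaps.append(gap)
    print(t, round(gap, 3))
```

A negative gap means the compressor spends more bits than the ideal length; for a universal code the gap would tend to 0 as $t$ grows.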



3 Identity Testing. 

Now we consider the problem of testing $H_0^{id}$ against $H_1^{id}$. Let the
required level of significance (or Type I error) be $\alpha$,
$\alpha \in (0, 1)$. (By definition, the Type I error occurs if $H_0^{id}$ is
true but the test rejects $H_0^{id}$.) We describe a statistical test which
can be constructed from any code $\varphi$.

The main idea of the suggested test is quite natural: compress a sample
sequence $x_1 \ldots x_n$ by a code $\varphi$. If the length of the codeword
$|\varphi(x_1 \ldots x_n)|$ is significantly less than the value
$-\log \pi(x_1 \ldots x_n)$, then $H_0^{id}$ should be rejected. The main
observation is that the probability of all rejected sequences is quite small
for any $\varphi$, which is why the Type I error can be made small. The
precise description of the test is as follows: the hypothesis $H_0^{id}$ is
accepted if

$$-\log \pi(x_1 \ldots x_n) - |\varphi(x_1 \ldots x_n)| \le -\log \alpha. \qquad (5)$$

Otherwise, $H_0^{id}$ is rejected. We denote this test by
$T^{(n)}_{\pi,\alpha,\varphi}$.

Theorem 1.

i) For each distribution $\pi$, $\alpha \in (0, 1)$ and a code $\varphi$, the
Type I error of the described test $T^{(n)}_{\pi,\alpha,\varphi}$ is not
larger than $\alpha$.

ii) If, in addition, $\pi$ is a finite-memory stationary and ergodic process
(i.e. $\pi \in M^*(A)$) and $\varphi$ is a universal code, then the Type II
error of the test $T^{(n)}_{\pi,\alpha,\varphi}$ goes to 0 when $n$ tends to
infinity.

Remark. The Kolmogorov complexity can be used instead of the length of a code.
Namely, consider the following test: the hypothesis $H_0^{id}$ is accepted if
$-\log \pi(x_1 \ldots x_n) - K(x_1 \ldots x_n) \le -\log \alpha$; otherwise,
$H_0^{id}$ is rejected. Theorem 1 is valid for this test, too.
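A minimal sketch of test (5) for the special case where $\pi$ is the uniform i.i.d. source over $\{0, 1\}$, so that $-\log \pi(x_1 \ldots x_n) = n$. The use of `zlib` as the code $\varphi$, the function name `identity_test_uniform` and the two sample inputs are assumptions of this illustration, not the setup used in the paper.

```python
import math
import random
import zlib

def identity_test_uniform(data: bytes, alpha: float) -> bool:
    """Test (5) for pi = uniform i.i.d. bits.

    `data` holds the sample packed eight bits per byte. Returns True when
    H_0^{id} is accepted and False when it is rejected.
    """
    n = 8 * len(data)                            # -log2 pi(x_1...x_n) = n
    code_len = 8 * len(zlib.compress(data, 9))   # |phi(x_1...x_n)| in bits
    return n - code_len <= -math.log2(alpha)

rng = random.Random(0)
random_like = bytes(rng.getrandbits(8) for _ in range(5000))
constant = bytes(5000)                           # 40 000 zero bits

print(identity_test_uniform(random_like, 0.01))
print(identity_test_uniform(constant, 0.01))
```

Any compressor could be substituted for `zlib` here; by Theorem 1 i) the bound on the Type I error does not depend on which uniquely decodable code is used.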

4 Testing of Serial Independence 

We first give some additional definitions. Let $v$ be a word
$v = v_1 \ldots v_k$, $k \le t$, $v_i \in A$. Denote the number of occurrences
(the rate) of the word $v$ in the sequence $x_1 x_2 \ldots x_k$,
$x_2 x_3 \ldots x_{k+1}$, $x_3 x_4 \ldots x_{k+2}$, $\ldots$,
$x_{t-k+1} \ldots x_t$ as $\nu^t(v)$. For example, if
$x_1 \ldots x_t = 000100$ and $v = 00$, then $\nu^6(00) = 3$. Now we define
for any $k \ge 0$ the so-called empirical Shannon entropy of order $k$ as
follows:

$$h^*_k(x_1 \ldots x_t) = -\frac{1}{t-k} \sum_{v \in A^k} \bar\nu^t(v) \sum_{a \in A} \frac{\nu^t(va)}{\bar\nu^t(v)} \log \frac{\nu^t(va)}{\bar\nu^t(v)}, \qquad (6)$$

where $k < t$ and $\bar\nu^t(v) = \sum_{a \in A} \nu^t(va)$. In particular, if
$k = 0$, we obtain

$$h^*_0(x_1 \ldots x_t) = -\frac{1}{t} \sum_{a \in A} \nu^t(a) \log(\nu^t(a)/t).$$
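The counts $\nu^t(v)$ and the empirical entropy (6) can be computed directly from a sliding window. The sketch below is one straightforward way to do it; the function names are assumptions of this example.

```python
import math
from collections import Counter

def count_words(x, k):
    """nu^t(v): occurrences of each length-k word v among the windows of x."""
    return Counter(tuple(x[i:i + k]) for i in range(len(x) - k + 1))

def empirical_entropy(x, k):
    """Empirical Shannon entropy h*_k(x_1...x_t) of order k, as in (6)."""
    t = len(x)
    if not 0 <= k < t:
        raise ValueError("need 0 <= k < t")
    nu = count_words(x, k + 1)              # nu^t(va) for v in A^k, a in A
    nu_bar = Counter()                      # nu_bar^t(v) = sum_a nu^t(va)
    for va, c in nu.items():
        nu_bar[va[:k]] += c
    # sum over v and a of nu(va) * log2(nu(va) / nu_bar(v))
    total = sum(c * math.log2(c / nu_bar[va[:k]]) for va, c in nu.items())
    return -total / (t - k)

x = "000100"
print(count_words(x, 2)[("0", "0")])
print(empirical_entropy(x, 0), empirical_entropy(x, 1))
```

On the example word from the text the first printed value reproduces $\nu^6(00) = 3$.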

Let, as before, $H_0^{ind}$ be that the source $\pi$ is Markovian with memory
(or connectivity) not greater than $m$ ($m \ge 0$), and the alternative
hypothesis $H_1^{ind}$ be that the sequence is generated by a stationary and
ergodic source which differs from the source under $H_0^{ind}$. The suggested
test is as follows.

Let $\varphi$ be any code. By definition, the hypothesis $H_0^{ind}$ is
accepted if

$$(t - m)\, h^*_m(x_1 \ldots x_t) - |\varphi(x_1 \ldots x_t)| \le \log(1/\alpha), \qquad (7)$$

where $\alpha \in (0, 1)$. Otherwise, $H_0^{ind}$ is rejected. We denote this
test by $\Upsilon^t_{\alpha,\varphi,m}$.

Theorem 2. i) For any distribution $\pi$ and any code $\varphi$ the Type I
error of the test $\Upsilon^t_{\alpha,\varphi,m}$ is less than or equal to
$\alpha$, $\alpha \in (0, 1)$.

ii) If, in addition, $\pi$ is a stationary and ergodic process over
$A^\infty$ and $\varphi$ is a universal code, then the Type II error of the
test $\Upsilon^t_{\alpha,\varphi,m}$ goes to 0 when $t$ tends to infinity.

Comment. If we use the Kolmogorov complexity $K(x_1 \ldots x_t)$ instead of
the length of the code $|\varphi(x_1 \ldots x_t)|$, the obtained test will
have the same properties.
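Test (7) can be sketched in a few lines once the empirical entropy is available. In the example below the alphabet is taken to be the 256 byte values and `zlib` plays the role of the code $\varphi$; both choices, the function names and the two sample sequences are assumptions of this illustration.

```python
import math
import random
import zlib
from collections import Counter

def empirical_entropy(x: bytes, m: int) -> float:
    """h*_m(x_1...x_t) from (6), in bits per letter."""
    t = len(x)
    nu = Counter(x[i:i + m + 1] for i in range(t - m))   # nu^t(va)
    nu_bar = Counter()                                   # nu_bar^t(v)
    for va, c in nu.items():
        nu_bar[va[:m]] += c
    total = sum(c * math.log2(c / nu_bar[va[:m]]) for va, c in nu.items())
    return -total / (t - m)

def independence_test(x: bytes, m: int, alpha: float) -> bool:
    """Test (7): True when H_0^{ind} (memory at most m) is accepted."""
    t = len(x)
    code_len = 8 * len(zlib.compress(x, 9))              # |phi(x_1...x_t)|
    statistic = (t - m) * empirical_entropy(x, m) - code_len
    return statistic <= math.log2(1 / alpha)

rng = random.Random(0)
iid = bytes(rng.getrandbits(8) for _ in range(5000))     # independent letters
periodic = bytes(range(256)) * 40                        # strongly dependent

print(independence_test(iid, 0, 0.01))
print(independence_test(periodic, 0, 0.01))
```

The periodic sequence has a uniform letter distribution, so its order-0 empirical entropy is maximal, yet it compresses very well; that gap is exactly what (7) detects.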

5 Experiments 

We applied the described method of identity testing to pseudorandom number
generators. More precisely, we denote by $U$ a source which generates
equiprobable and independent symbols from the alphabet $\{0, 1\}$ and
consider the hypothesis $H_0^{id}$ that a sequence is generated by $U$.

We have taken linear congruential generators (LCG), which are defined by the
following equality:

$$X_{n+1} = (A \cdot X_n + C) \bmod M,$$

where $X_n$ is the $n$-th generated number [?]. We denote each such generator
by LCG($M$, $A$, $C$, $X_0$), where $X_0$ is the initial value of the
generator. Such generators are well studied and many of them are used in
practice, see [?].

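In our experiments we extract an eight-bit word from each generated $X_i$
using the following algorithm. First, the number $\mu = \lfloor M/256 \rfloor$
is calculated, and then each $X_i$ is transformed into an 8-bit word
$\hat X_i$, which takes values in $\{0, \ldots, 255\}$, as follows:

$$\hat X_i = \begin{cases} \lfloor X_i/\mu \rfloor & \text{if } X_i < 256\mu, \\ \text{empty word} & \text{if } X_i \ge 256\mu. \end{cases} \qquad (8)$$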
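The generator and the extraction rule (8) can be sketched as follows. The function names are assumptions of this example, and the quotient is taken by $\mu$ on the assumption that each extracted word must fit in eight bits; the parameters are those of the second generator in Table 1.

```python
from itertools import islice

def lcg(M, A, C, x0):
    """Yield X_1, X_2, ... with X_{n+1} = (A * X_n + C) mod M."""
    x = x0
    while True:
        x = (A * x + C) % M
        yield x

def extract_bytes(values, M):
    """Apply rule (8): map each X_i to an 8-bit word or drop it."""
    mu = M // 256
    for x in values:
        if x < 256 * mu:
            yield x // mu        # assumed quotient; always in 0..255
        # otherwise X_i contributes the empty word

M, A, C, x0 = 2**31, 2**16 + 3, 0, 1
first = list(islice(lcg(M, A, C, x0), 3))
# 50 000 extracted words correspond to a 400 000-bit sequence.
sample = bytes(islice(extract_bytes(lcg(M, A, C, x0), M), 50_000))
print(first, len(sample))
```

The resulting `sample` is the kind of byte string that would then be handed to an archiver.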



Then the sequence was compressed by the archiver ACE v 1.2b (see http://www.winace.com/).
Experimental data on the testing of four linear congruential generators are
given in Table 1.



Table 1: Results of experiments

Parameters M, A, C, X_0        Length 400 000 bits    Length 8 000 000 bits
10^8 + 1, 23, 0, 47594118      390 240                7 635 936
2^31, 2^16 + 3, 0, 1           extended               7 797 984
2^32, 134775813, 1, 0          extended               extended
2^32, 69069, 0, 1              extended               extended


So, we can see from the first line of the table that the 400000-bit sequence
generated by LCG($10^8 + 1$, 23, 0, 47594118) and transformed according to
(8) was compressed to a 390240-bit sequence. (Here 400000 is the length of the
sequence after transformation.) If we take the level of significance
$\alpha > 2^{-9760}$ and apply the test $T^{(400000)}_{U,\alpha,\varphi}$
($\varphi$ = ACE v 1.2b), the hypothesis $H_0^{id}$ should be rejected, see
Theorem 1 and (5). Analogously, the second line of the table shows that the
8000000-bit sequence generated by LCG($2^{31}$, $2^{16} + 3$, 0, 1) cannot be
considered as random. ($H_0^{id}$ should be rejected if the level of
significance $\alpha$ is greater than $2^{-202016}$.) On the other hand, the
suggested test accepts $H_0^{id}$ for the sequences generated by the two
latter generators, because the lengths of the "compressed" sequences
increased.

The obtained information corresponds to the known data about the generators
mentioned above. Thus, it is shown in [6] that the first two generators are
bad, whereas the last two generators were investigated in [?] and [?],
respectively, and are regarded as good. So, in these experiments the
suggested test separates the generators known to be bad from those regarded
as good.
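The rejection thresholds quoted for the first two generators follow from (5) with $\pi = U$, for which $-\log U(x_1 \ldots x_n) = n$. As a worked check that uses only the lengths reported in Table 1:

```latex
\begin{aligned}
\text{first generator:}\quad
  & -\log U(x_1 \ldots x_n) - |\varphi(x_1 \ldots x_n)|
    = 400000 - 390240 = 9760, \\
  & \text{so (5) fails when } 9760 > -\log\alpha,
    \text{ i.e. } \alpha > 2^{-9760}; \\
\text{second generator:}\quad
  & 8000000 - 7797984 = 202016, \\
  & \text{so (5) fails when } \alpha > 2^{-202016}.
\end{aligned}
```

For the entries marked "extended" the left-hand side of (5) is negative, so (5) holds for every $\alpha \in (0, 1)$.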
6 Appendix.



The following well-known inequality, whose proof can be found in [?], will be
used in the proofs of both theorems.

Lemma. Let $p$ and $q$ be two probability distributions over some alphabet
$B$. Then $\sum_{b \in B} p(b) \log(p(b)/q(b)) \ge 0$, with equality if and
only if $p = q$.
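A quick numeric check of the Lemma on a three-letter alphabet; the two distributions below are arbitrary examples and the function name is an assumption of this sketch.

```python
import math

def kl_divergence(p, q):
    """sum_b p(b) log2(p(b)/q(b)), with the convention 0 log 0 = 0."""
    return sum(pb * math.log2(pb / q[b]) for b, pb in p.items() if pb > 0)

p = {"a": 0.5, "b": 0.25, "c": 0.25}
q = {"a": 0.25, "b": 0.25, "c": 0.5}
print(kl_divergence(p, q), kl_divergence(p, p))
```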

Proof of Theorem 1. Let $C_\alpha$ be the critical set of the test
$T^{(n)}_{\pi,\alpha,\varphi}$, i.e., by definition,
$C_\alpha = \{u : u \in A^n \;\&\; -\log \pi(u) - |\varphi(u)| > -\log \alpha\}$.
Let $\mu_\varphi$ be a measure for which Claim 1 is true. We define an
auxiliary set

$$\hat C_\alpha = \{u : -\log \pi(u) - (-\log \mu_\varphi(u)) > -\log \alpha\}.$$

We have

$$1 \ge \sum_{u \in \hat C_\alpha} \mu_\varphi(u) \ge \sum_{u \in \hat C_\alpha} \pi(u)/\alpha = (1/\alpha)\, \pi(\hat C_\alpha).$$

(Here the second inequality follows from the definition of $\hat C_\alpha$,
whereas all others are obvious.) So, we obtain that
$\pi(\hat C_\alpha) \le \alpha$. From the definitions of $C_\alpha$,
$\hat C_\alpha$ and (2) we immediately obtain that
$\hat C_\alpha \supset C_\alpha$. Thus, $\pi(C_\alpha) \le \alpha$. By
definition, $\pi(C_\alpha)$ is the value of the Type I error. The first
statement of Theorem 1 is proven.
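The bound of Theorem 1 i) can be verified exactly on a small case by enumerating the critical set. In the sketch below $\pi$ is the uniform source on binary words of length $n$, and the code is a hypothetical Shannon code built for a Bernoulli model with parameter `q1`; its codeword lengths $\lceil -\log q(u) \rceil$ satisfy the Kraft inequality. All names and parameter values are assumptions of this illustration.

```python
import math

def type1_error(n, alpha, q1=0.2):
    """Exact probability, under uniform pi, of the critical set of test (5).

    The code phi is a hypothetical Shannon code for a Bernoulli(q1) model:
    |phi(u)| = ceil(-log2 q(u)), which depends only on the number of ones.
    """
    rejected = 0
    for ones in range(n + 1):
        model_bits = -(ones * math.log2(q1) + (n - ones) * math.log2(1 - q1))
        code_len = math.ceil(model_bits)
        if n - code_len > -math.log2(alpha):    # u lies in C_alpha
            rejected += math.comb(n, ones)      # words with this many ones
    return rejected / 2 ** n

for alpha in (0.1, 0.01, 0.001):
    print(alpha, type1_error(20, alpha))
```

The code is deliberately mismatched to $\pi$, so some words are rejected, but their total probability stays below $\alpha$ as the theorem requires.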

Let us prove the second statement of the theorem. Suppose that the hypothesis
$H_1^{id}$ is true. That is, the sequence $x_1 \ldots x_t$ is generated by some
stationary and ergodic source $\tau$ and $\tau \ne \pi$. Our strategy is to
show that

$$\lim_{t \to \infty} \left( -\log \pi(x_1 \ldots x_t) - |\varphi(x_1 \ldots x_t)| \right) = \infty \qquad (9)$$

with probability 1 (according to the measure $\tau$). First we represent the
left-hand side of (9) as

$$-\log \pi(x_1 \ldots x_t) - |\varphi(x_1 \ldots x_t)|$$
$$= t \left( \frac{1}{t} \log \frac{\tau(x_1 \ldots x_t)}{\pi(x_1 \ldots x_t)} + \frac{1}{t} \left( -\log \tau(x_1 \ldots x_t) - |\varphi(x_1 \ldots x_t)| \right) \right).$$

From this equality and the property (4) of a universal code we obtain

$$-\log \pi(x_1 \ldots x_t) - |\varphi(x_1 \ldots x_t)| = t \left( \frac{1}{t} \log \frac{\tau(x_1 \ldots x_t)}{\pi(x_1 \ldots x_t)} + o(1) \right). \qquad (10)$$

Now we use some results of ergodic theory and information theory, which can
be found, e.g., in [?]. Firstly, according to the Shannon-McMillan-Breiman
theorem, there exists the limit $\lim_{t \to \infty} -\log \tau(x_1 \ldots x_t)/t$
(with probability 1), and this limit is equal to the so-called limit Shannon
entropy, which we denote as $h_\infty(\tau)$. Secondly, it is known that for
any integer $k \ge 0$ the following inequality is true:
$h_\infty(\tau) \le -\sum_{v \in A^k} \tau(v) \sum_{a \in A} \tau(a/v) \log \tau(a/v)$.
(Here the right-hand value is called the $k$-order conditional entropy.) It
will be convenient to represent both statements as follows:

$$\lim_{t \to \infty} -\log \tau(x_1 \ldots x_t)/t \le -\sum_{v \in A^k} \tau(v) \sum_{a \in A} \tau(a/v) \log \tau(a/v) \qquad (11)$$

for any $k \ge 0$ (with probability 1). It is supposed that the process $\pi$
has a finite memory, i.e. belongs to $M_s(A)$ for some $s$. Taking into
account the definition (1) of $M_s(A)$, we obtain the following
representation:

$$-\log \pi(x_1 \ldots x_t)/t = -t^{-1} \sum_{i=1}^{t} \log \pi(x_i / x_1 \ldots x_{i-1})$$
$$= -t^{-1} \Big( \sum_{i=1}^{k} \log \pi(x_i / x_1 \ldots x_{i-1}) + \sum_{i=k+1}^{t} \log \pi(x_i / x_{i-k} \ldots x_{i-1}) \Big)$$

for any $k \ge s$. According to the ergodic theorem there exists a limit

$$\lim_{t \to \infty} -t^{-1} \sum_{i=k+1}^{t} \log \pi(x_i / x_{i-k} \ldots x_{i-1}),$$

which is equal to $-\sum_{v \in A^k} \tau(v) \sum_{a \in A} \tau(a/v) \log \pi(a/v)$,
see [?]. So, from the two latter equalities we can see that

$$\lim_{t \to \infty} (-\log \pi(x_1 \ldots x_t))/t = -\sum_{v \in A^k} \tau(v) \sum_{a \in A} \tau(a/v) \log \pi(a/v).$$

Taking into account this equality, (11) and (10), we can see that

$$-\log \pi(x_1 \ldots x_t) - |\varphi(x_1 \ldots x_t)| \ge t \Big( \sum_{v \in A^k} \tau(v) \sum_{a \in A} \tau(a/v) \log(\tau(a/v)/\pi(a/v)) \Big) + o(t)$$

for any $k \ge s$. From this inequality and the Lemma we can obtain that
$-\log \pi(x_1 \ldots x_t) - |\varphi(x_1 \ldots x_t)| \ge c\,t + o(t)$, where
$c$ is a positive constant, $t \to \infty$. Hence, (9) is true and the
theorem is proven.
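The passage from (10) and (11) to the lower bound used at the end of this proof can be spelled out as follows. This is a sketch of the intermediate steps, valid with probability 1 under $\tau$ and for any $k \ge s$:

```latex
\begin{aligned}
-\log\pi(x_1 \ldots x_t) - |\varphi(x_1 \ldots x_t)|
 &= \log\tau(x_1 \ldots x_t) - \log\pi(x_1 \ldots x_t) + o(t)
   && \text{by (10)} \\
 &= -t\,h_\infty(\tau)
    - t \sum_{v \in A^k} \tau(v) \sum_{a \in A} \tau(a/v)\log\pi(a/v) + o(t)
   && \text{by the two limits} \\
 &\ge t \sum_{v \in A^k} \tau(v) \sum_{a \in A} \tau(a/v)
      \log\frac{\tau(a/v)}{\pi(a/v)} + o(t)
   && \text{by (11)}
\end{aligned}
```

The last step uses $-h_\infty(\tau) \ge \sum_{v \in A^k} \tau(v) \sum_{a \in A} \tau(a/v) \log \tau(a/v)$, which is (11) with the sign reversed.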

Proof of Theorem 2. First we show that for any source $\theta^* \in M_0(A)$
and any word $x_1 \ldots x_t \in A^t$, $t \ge 1$, the following inequality is
valid:

$$\theta^*(x_1 \ldots x_t) = \prod_{a \in A} (\theta^*(a))^{\nu^t(a)} \le \prod_{a \in A} (\nu^t(a)/t)^{\nu^t(a)}. \qquad (12)$$

Here the equality holds because $\theta^* \in M_0(A)$. The inequality follows
from the Lemma. Indeed, if $p(a) = \nu^t(a)/t$ and $q(a) = \theta^*(a)$, then
$\sum_{a \in A} \frac{\nu^t(a)}{t} \log \frac{\nu^t(a)/t}{\theta^*(a)} \ge 0$.
From the latter inequality we obtain (12).
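Inequality (12) says that no i.i.d. source assigns a word more probability than the source whose letter probabilities equal the empirical frequencies of that word. A small numeric check, with an arbitrary example word and hypothetical helper names:

```python
import math
from collections import Counter

def bernoulli_prob(x, theta):
    """theta*(x_1...x_t) for an i.i.d. source with letter probabilities theta."""
    return math.prod(theta[a] for a in x)

def max_likelihood_bound(x):
    """Right-hand side of (12): prod_a (nu^t(a)/t)^(nu^t(a))."""
    t = len(x)
    return math.prod((c / t) ** c for c in Counter(x).values())

x = "0001000110"                      # seven 0's and three 1's
bound = max_likelihood_bound(x)
checks = []
for p0 in (0.1, 0.5, 0.7, 0.9):
    theta = {"0": p0, "1": 1 - p0}
    # Small tolerance: at p0 = 0.7 the two sides agree up to rounding.
    checks.append(bernoulli_prob(x, theta) <= bound + 1e-15)
print(checks)
```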

Now let $\theta$ belong to $M_m(A)$, $m > 0$. We will prove that for any
$x_1 \ldots x_t$

$$\theta(x_1 \ldots x_t) \le \prod_{u \in A^m} \prod_{a \in A} (\nu^t(ua)/\bar\nu^t(u))^{\nu^t(ua)}. \qquad (13)$$

Indeed, we can present $\theta(x_1 \ldots x_t)$ as

$$\theta(x_1 \ldots x_t) = \theta(x_1 \ldots x_m) \prod_{u \in A^m} \prod_{a \in A} \theta(a/u)^{\nu^t(ua)},$$

where $\theta(x_1 \ldots x_m)$ is the limit probability of the word
$x_1 \ldots x_m$. Hence,
$\theta(x_1 \ldots x_t) \le \prod_{u \in A^m} \prod_{a \in A} \theta(a/u)^{\nu^t(ua)}$.
Taking into account the inequality (12), we obtain

$$\prod_{a \in A} \theta(a/u)^{\nu^t(ua)} \le \prod_{a \in A} (\nu^t(ua)/\bar\nu^t(u))^{\nu^t(ua)}$$

for any word $u$. So, from the last two inequalities we obtain (13).

It will be convenient to define two auxiliary measures on $A^t$ as follows:

$$\pi_m(x_1 \ldots x_t) = \Delta\, 2^{-(t-m)\, h^*_m(x_1 \ldots x_t)}, \qquad \sigma(x_1 \ldots x_t) = 2^{-|\varphi(x_1 \ldots x_t)|}, \qquad (14)$$

where $x_1 \ldots x_t \in A^t$ and
$\Delta = \big(\sum_{x_1 \ldots x_t \in A^t} 2^{-(t-m)\, h^*_m(x_1 \ldots x_t)}\big)^{-1}$.
If we take into account that
$2^{-(t-m)\, h^*_m(x_1 \ldots x_t)} = \prod_{u \in A^m} \prod_{a \in A} (\nu^t(ua)/\bar\nu^t(u))^{\nu^t(ua)}$,
we can see from (13) and (14) that, for any measure $\theta \in M_m(A)$ and
any $x_1 \ldots x_t \in A^t$,

$$\theta(x_1 \ldots x_t) \le \pi_m(x_1 \ldots x_t)/\Delta. \qquad (15)$$

Let us denote the critical set of the test $\Upsilon^t_{\alpha,\varphi,m}$ as
$C_\alpha$, i.e., by definition,
$C_\alpha = \{x_1 \ldots x_t : (t-m)\, h^*_m(x_1 \ldots x_t) - |\varphi(x_1 \ldots x_t)| > \log(1/\alpha)\}$.
From (14) we obtain

$$C_\alpha = \{x_1 \ldots x_t : (t-m)\, h^*_m(x_1 \ldots x_t) - (-\log \sigma(x_1 \ldots x_t)) > \log(1/\alpha)\}. \qquad (16)$$

From (15) and (16) we can see that for any measure $\theta \in M_m(A)$

$$\theta(C_\alpha) \le \pi_m(C_\alpha)/\Delta. \qquad (17)$$
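From (16) and (14) we obtain

$$C_\alpha = \{x_1 \ldots x_t : 2^{(t-m)\, h^*_m(x_1 \ldots x_t)} > (\alpha\, \sigma(x_1 \ldots x_t))^{-1}\}$$
$$= \{x_1 \ldots x_t : (\pi_m(x_1 \ldots x_t)/\Delta)^{-1} > (\alpha\, \sigma(x_1 \ldots x_t))^{-1}\}.$$

Finally,

$$C_\alpha = \{x_1 \ldots x_t : \sigma(x_1 \ldots x_t) > \pi_m(x_1 \ldots x_t)/(\alpha \Delta)\}. \qquad (18)$$

The following chain of inequalities and equalities is valid:

$$1 \ge \sum_{x_1 \ldots x_t \in C_\alpha} \sigma(x_1 \ldots x_t) \ge \sum_{x_1 \ldots x_t \in C_\alpha} \pi_m(x_1 \ldots x_t)/(\alpha \Delta)$$
$$= \pi_m(C_\alpha)/(\alpha \Delta) \ge \theta(C_\alpha)\, \Delta/(\alpha \Delta) = \theta(C_\alpha)/\alpha.$$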

(Here both equalities and the first inequality are obvious; the second and the
third inequalities follow from (18) and (17), respectively.) So, we obtain
that $\theta(C_\alpha) \le \alpha$ for any measure $\theta \in M_m(A)$. Taking
into account that $C_\alpha$ is the critical set of the test, we can see that
the probability of the Type I error is not greater than $\alpha$. The first
claim of the theorem is proven.

The proof of the second statement of the theorem is based on some results of
information theory. The $t$-order conditional Shannon entropy is defined as
follows:

$$h_t(p) = -\sum_{x_1 \ldots x_t \in A^t} p(x_1 \ldots x_t) \sum_{a \in A} p(a / x_1 \ldots x_t) \log p(a / x_1 \ldots x_t), \qquad (19)$$

where $p \in M_\infty(A)$. It is known that for any $p \in M_\infty(A)$,
firstly, $\log |A| \ge h_0(p) \ge h_1(p) \ge \ldots$; secondly, there exists
the limit Shannon entropy $h_\infty(p) = \lim_{t \to \infty} h_t(p)$; thirdly,
$\lim_{t \to \infty} -t^{-1} \log p(x_1 \ldots x_t) = h_\infty(p)$ with
probability 1; and, finally, $h_m(p)$ is strictly greater than $h_\infty(p)$
if the memory of $p$ is greater than $m$ (i.e.
$p \in M_\infty(A) \setminus M_m(A)$); see, for example, [?].

Taking into account the definition of the universal code (4), we obtain from
the above described properties of the entropy that

$$\lim_{t \to \infty} t^{-1} |\varphi(x_1 \ldots x_t)| = h_\infty(p) \qquad (20)$$

with probability 1. It can be seen from (6) that $h^*_m$ is an estimate of the
$m$-order Shannon entropy (19). Applying the ergodic theorem we obtain
$\lim_{t \to \infty} h^*_m(x_1 \ldots x_t) = h_m(p)$ with probability 1; see
[?]. Taking into account that $h_m(p) > h_\infty(p)$ and (20), we obtain from
the last equality that
$\lim_{t \to \infty} ((t - m)\, h^*_m(x_1 \ldots x_t) - |\varphi(x_1 \ldots x_t)|) = \infty$.
This proves the second statement of the theorem.